Hierarchical clustering with discrete latent variable models and the integrated classification likelihood
Abstract
Finding a set of nested partitions of a dataset is useful to uncover relevant structure at different scales, and is often dealt with using a data-dependent methodology. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Considering the integrated classification likelihood (ICL) criterion as an objective function, this work applies to every discrete latent variable model (DLVM) where this quantity is tractable. The first step of the methodology involves maximizing the criterion with respect to the partition. Addressing the known problem of sub-optimal local maxima found by greedy hill-climbing heuristics, we introduce a new hybrid algorithm based on a genetic algorithm that efficiently explores the space of solutions. The resulting algorithm carefully combines and merges different solutions, and allows the joint inference of the number $K$ of clusters as well as the clusters themselves. Starting from this natural partition, the second step of the methodology is based on a bottom-up greedy procedure to extract a hierarchy of clusters. In a Bayesian context, this is achieved by considering the Dirichlet cluster proportion prior parameter $\alpha$ as a regularization term controlling the granularity of the clustering. A new approximation of the criterion is derived as a log-linear function of $\alpha$, enabling a simple functional form of the merge decision criterion. This second step allows the exploration of the clustering at coarser scales. The proposed approach is compared with existing strategies on simulated as well as real settings, and its results are shown to be particularly relevant. A reference implementation of this work is available in the R package greed accompanying the paper.
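The role of $\alpha$ as a granularity control can be illustrated with a minimal Python sketch. It is not the paper's algorithm and not the API of the greed package: it only evaluates the standard Dirichlet-multinomial term $\log p(Z \mid \alpha)$ of a partition with given cluster sizes, and omits the model-specific data term of the criterion entirely. The function names `log_partition_prior` and `merge_gain` and the cluster sizes are hypothetical, chosen for illustration.

```python
from math import lgamma


def log_partition_prior(counts, alpha):
    """log p(Z | alpha) for a partition with cluster sizes `counts`,
    under a symmetric Dirichlet(alpha) prior on cluster proportions.
    Assumption: this is only the prior part of the criterion."""
    k = len(counts)
    n = sum(counts)
    return (
        lgamma(k * alpha)
        - k * lgamma(alpha)
        + sum(lgamma(c + alpha) for c in counts)
        - lgamma(n + k * alpha)
    )


def merge_gain(counts, g, h, alpha):
    """Change in the prior term when clusters g and h are merged (K -> K - 1)."""
    merged = [c for i, c in enumerate(counts) if i not in (g, h)]
    merged.append(counts[g] + counts[h])
    return log_partition_prior(merged, alpha) - log_partition_prior(counts, alpha)


# Hypothetical cluster sizes; merge the two smallest clusters.
counts = [50, 30, 20]
gains = {alpha: merge_gain(counts, 1, 2, alpha) for alpha in (1.0, 0.1, 0.01)}

for alpha, gain in gains.items():
    print(f"alpha={alpha}: prior gain from merging = {gain:.2f}")
```

In this sketch the prior gain from a merge grows as $\alpha$ shrinks, roughly like $-\log \alpha$ for small $\alpha$. In a full criterion this gain would be weighed against the loss in the data term, so decreasing $\alpha$ progressively tips more merges in favor of coarser partitions.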
Similar resources
Building Blocks for Hierarchical Latent Variable Models
We introduce building blocks from which a large variety of latent variable models can be built. The blocks include continuous and discrete variables, summation, addition, nonlinearity and switching. Ensemble learning provides a cost function which can be used for updating the variables as well as optimising the model structure. The blocks are designed to fit together and to yield efficient upda...
Clustering Mixed Data via Latent Variable Models
A model based clustering procedure for data of mixed type, termed clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. The model employs a parsimonious covariance structure f...
Leveraging the Exact Likelihood of Deep Latent Variable Models
Deep latent variable models combine the approximation abilities of deep neural networks and the statistical foundations of generative models. The induced data distribution is an infinite mixture model whose density is extremely delicate to compute. Variational methods are consequently used for inference, following the seminal work of Rezende et al. (2014) and Kingma & Welling (2014). We study t...
The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance
Given the growing spread of fraud in the insurance sector, particularly in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Sequential Dynamic Classification Using Latent Variable Models
Adaptive classification is an important online problem in data analysis. The nonlinear and nonstationary nature of much data makes standard static approaches unsuitable. In this paper, we propose a set of sequential dynamic classification algorithms based on extension of nonlinear variants of Bayesian Kalman processes and dynamic generalized linear models. The approaches are shown to work well ...
Journal
Journal title: Advances in Data Analysis and Classification
Year: 2021
ISSN: 1862-5355, 1862-5347
DOI: https://doi.org/10.1007/s11634-021-00440-z